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ABSTRACT 

The derivation of cosmological parameters from astrophysical data sets routinely 
involves operations counts which scale as 0(N 3 ) where N is the number of data points. 
Currently planned missions, including MAP and Planck, will generate sky maps with 
Nj = 10 6 or more pixels. Simple "brute force" analysis, applied to such mega-pixel 



data, would require years of computing even on the fastest computers. We describe an 



algorithm which allows estimation of the likelihood function in the direct pixel basis. 



CSj ■ The algorithm uses a conjugate gradient approach to evaluate \ 2 an d a geometric 

approximation to evaluate the determinant. Monte Carlo simulations provide a 

Q\ . correction to the determinant, yielding an unbiased estimate of the likelihood surface 

in an arbitrary region surrounding the likelihood peak. The algorithm requires 
0(N d ) operations and O(Nd) storage for each likelihood evaluation, and allows 
for significant parallel computation. 
Subject headings: methods: data analysis - cosmic microwave background 
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1 Introduction 

Recent advances in instrumentation have transformed observational cosmology from 
a field starved for data to one where data management is fast becoming a limiting 
factor. Prior to 1992, for example, there were no detections of anisotropy in the 
cosmic microwave background (CMB). The Cosmic Background Explorer opened the 
floodgates with maps containing 10 4 pixels, while scheduled missions such as MAP 
and Planck plan for 10 6 or more pixels. While highly desirable from a scientific 
standpoint, the explosive growth of cosmological data sets carries the risk that their 
sheer size will hamper analysis through computational limits on existing or planned 
computers. 

Maximum likelihood methods are commonly used for parameter estimation 
with maps of the cosmic microwave background. For a multivariate Gaussian 
distribution, the probability of obtaining Nj, data points A, given a set of model 
parameters p is 

£ = P(A\ P ) = (2.)-^ 6X ^ 2) (1) 

where 

N d N d 

X 2 = Y,Y.MM-%A 3 , (2) 

t=l j=l 

is a goodness-of-fit statistic and My is the N^ x N^ covariance matrix. The "best" 
choice of parameters po is that which maximizes the likelihood function L. The 
curvature of the likelihood surface about the maximum defines the uncertainty in the 
fitted parameters, 



fy > V( F -%- ( 3 ) 

where 

dpidpj 

is the Fisher information matrix and L = — log(£) (see Bunn & Sugiyama 1995; 
Vogeley & Szalay 1996; Tegmark et al. 1997; Bond et al. 1998). 

The maximum likelihood estimator is unbiased and asymptotically approaches 
the equality in Eq. ||. However, these advantages come at a steep price: computation 
of both x 2 an d the determinant |M| in Eq. [I] scale as 0(N%) operations, making 
brute-force calculation computationally infeasible. For large data sets (N^ > 10 6 ) the 
time required is measured in years, even on the most powerful computers. 

Conjugate gradient techniques provide half of the solution. We may reduce 
the operations count of the \ 2 calculation by computing the vector 

z = M _1 A 



in Eq. |2], effectively trading the cost of a matrix inversion for the requirement of 
re-calculating z for every different sky map A. If the the matrix M is dominated 
by the diagonal elements (or is pre-conditioned to be suitably close to the identity 
matrix) an iterative solution for z can be found in O(Nj) operations (Press et al. 1992 
and references therein; Oh, Spergel, & Hinshaw 1999). The determinant calculation, 
however, is less tractable. A number of authors have suggested ways around this 
problem. Karhunen-Loeve eigenvalue techniques produce moderate data compression, 
reducing the Nj original data points to N' ~ A^/10 eigenmodes (Bond 1994; Bunn & 
Sugiyama 1995; Tegmark et al. 1997). However, estimating cosmological parameters 
from the smaller set of eigenmodes still scales as (N') 3 operations, making such 
techniques undesirable for mega-pixel data sets. 

Oh et al. (1999) derive a method for likelihood evaluation using a Newton- 
Raphson quadratic iteration scheme. The determinant is first approximated using 
azimuthal symmetry of the noise matrix (appropriate for full-sky CMB maps), then 
corrected using Monte Carlo simulations. The method provides a nearly minimum- 
variance estimate of the angular power spectrum for CMB anisotropy maps in 0(N d ) 

operations and 0(N d ' ) storage; cosmological parameters can then be derived by 
comparing the power spectrum to various models. Although this algorithm is fast 
enough for mega-pixel data sets, it is optimized to estimate the power spectrum, 
rather than the underlying cosmological parameters. When used as a root-finding 
technique in parameter space, it requires a sufficiently good starting estimate to 
guarantee convergence to true maximum. The radius of convergence in parameter 
space is small and the problems associated with parameter covariance become severe. 

Borrill (1998) offers a global solution to bound the likelihood. This method 
uses Gaussian quadrature to bound the likelihood at any point in parameter space 
(not just near the likelihood maximum); it is thus well suited to search parameter 
space using the direct pixel basis. However, the method requires 0(N d ' ) operations 
for each likelihood evaluation and is thus significantly slower than the method of Oh 
et al. (1999). More importantly, it can only provide bounds on the likelihood, fixing 
log(£) to accuracy of a few percent. Since log(£) > N d (a large number), errors of a 
few percent can create significant bias in the location of the likelihood maximum. 

This paper describes an algorithm to estimate cosmological parameters from 
mega-pixel data sets using maximum-likelihood techniques on pixelized sky maps. It 
uses a conjugate gradient algorithm to evaluate x 2 an d a geometric approximation 
to evaluate the determinant. Monte Carlo simulations provide a correction to the 
determinant, yielding an unbiased estimate of the likelihood surface in an arbitrary 
region surrounding the likelihood peak. The algorithm requires 0(N d ) operations 
and O(Nd) storage for each likelihood evaluation, and allows for significant parallel 
computation. 



2 Likelihood Evaluation 

Let the input data consist of a map of the microwave sky - a vector consisting of 
temperature differences Aj evaluated at Nd pixels on the sky. The temperature in 
each pixel consists of a cosmological signal plus instrument noise, Aj = Si + n*, 
where we ignore for now the question of contaminating foreground signals (galactic 
and extragalactic emission). It is convenient to expand the CMB signal in spherical 
harmonics, 

s(fl) = Y^ ae m Y em (Q). (5) 

hn. 

Inflationary models predict the a^ m to be random Gaussian variables whose variance 
depends only on angular scale l and not on m. The co variance matrix My is thus 

My = (AA-) = -^ E( 2 ^ + l)W?C e P e (n t ■ rij) + jU,, (6) 

where Wf is the experimental window function that includes the effects of beam 
smoothing and finite pixel size, Pi(rii ■ rij) is the Legendre polynomial of order £, rii is 
the unit vector towards the center of pixel i, N° hs is the number of observations for 
pixel i, and Q is the angular power spectrum 

(demO-e'm') = CeSw5 mm '- 
We wish to evaluate the function 

L = -log(£)ocx 2 + log(|M| ) 
to derive the set of parameters p = [Q, flj,, H , A, ...] which minimize L. 

2.1 Chi- Squared Evaluation 

Conjugate gradient techniques, to be efficient, require the matrix M to be suitably 
close to the identity matrix. For CMB maps, this is almost always the case. To 
see this, note that My = (AjAj) = C(%) where C(9) is the two-point correlation 
function. Since C(0) > C{9) for all 9 > 0, the matrix must satisfy 

M kk > M lk , i^k (7) 

A particularly simple iterative method for the x 2 evaluation can be derived as follows. 
We define a residual vector 

r = Mz- A (8) 

which by definition is zero when z is solved exactly. Now suppose that we have a 
guess which differs from the exact solution by amount 5z in the k th element, with 



corresponding residual vector r, = Sz M^, that is, the scalar Sz multiplied by the k th 
row of the covariance matrix. But from Eq. [^ we may identify k as the position of 
the largest element of the vector r, and apply corrections 



z k — z k — Sz 

ri = n- Sz M ik 



(9) 



where 



Sz = Max(i?) / M 



kk 



(10) 

For the general case where more than one element of z is incorrect, we may proceed 
iteratively, using Eqs. |H] and |J] until the largest element in the residual vector falls 
below some pre-determined threshold. 

Figure [I] shows the convergence of x 2 f° r a simulated sky map and vector z 
evaluated using the algorithm above for the case where M correctly describes the 
input map. The convergence is rapid. Each iteration requires 0(Na) operations (the 
vector multiplication in Eq. ||) while the entire algorithm converges in Nj log(A^) 
iterations. 

2.2 Determinant Approximation 

The determinant calculation in Eq. [TJ scales as 0(N%) operations, making direct 
evaluation prohibitively expensive. We make a first approximation to |M| by noting 
that the elements My depend only on the angular separation between pixels i and j. 
For a rotationally invariant signal, rows Mjfc and TS/Ljk thus consist of the (nearly) the 
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Figure 1: Convergence of \ 2 evaluated using the conjugate-gradient algorithm 

on a simulated CMB map with 64720 pixels. 



same numbers, simply repeated in different order according to the pixelization. All 
of the information in the covariance matrix is contained in the two-point correlation 
function C(9) - the covariance matrix merely samples C(9) at a set of discrete values, 
then orders the values according to the pixel scheme. We thus make the ansatz that 

log(|M|) = alog(|M'|) (11) 

where the covariance matrix M' is formed from a subset of the pixels in the original 
sky map, chosen to sample the correlation function with the same angular distribution 
as the full set of pixels. If there are N' pixels in the subset, computing |M'| costs 

(iV) 3 operations. We thus restrict ourselves to N'lN d so that the entire calculation 
scales as iVj or faster. 

The subset of N' pixels from the original sky map can be selected several ways. 
A random set of pixels will, on average, match the distribution of angular separations 
9ij in the original sky map. However, sample variance becomes important for small 
subsets and can over- or under-represent some separations in a single realization. 
While this would average out over many realizations, we are only allowed a single 
realization in the likelihood evaluation. We can overcome this by using subset pixels 
in a fixed geometric orientation. For full-sky maps, we obtain satisfactory results 
using a set of three interlocking great circles. Other configurations can be devised for 
different map geometries. Once the subset of N' pixels is selected, the determinant 
|M'| can be computed using standard techniques. For the great-circle geometry, the 
number of subset pixels scales as N' oc y/Wd- The determinant |M'| thus requires 

0(N d ) operations and 0(Nd) storage. 

To derive the scale factor a, we again take advantage of the fact that for CMB 
maps the matrices M and M' are suitably close to diagonal. For diagonal matrices, 

"" = E,log(M<„)- (12) 

Off-diagonal elements produce corrections to this simple relation. For the CMB maps 



expected from the MAP and Planck surveys, however, Equation [12] is accurate to a 
few percent. We have tested this simple scaling for maps ranging from 384 to 6144 
pixels, the largest value for which the full determinant can readily be evaluated. The 
simplistic approximation is accurate to a few percent over a wide range of parameter 
space and map size. 

2.3 Monte Carlo Correction 



Equations |TT| and [L^ allow efficient evaluation of the logarithmic likelihood 

L = x 2 + a|M'| (13) 

6 



at any point in parameter space to an accuracy of ~2% for computational cost 
0(N d ' ) and storage 0(N d ). It is thus well-suited for the initial stages of parameter 
estimation in the direct pixel basis. In this scheme, we begin with some initial 
choice of parameters and use the methods outlined above to evaluate the likelihood 
in the neighborhood of the initial parameters. Using established minimization search 
techniques (see, e.g., Press et al. 1992) we then step through parameter space toward 
the global likelihood maximum. 

The simple approximation to L differs from the true value by a few percent 
in a smoothly-varying fashion. It thus returns a biased estimate of the global 
maximum, but does not create secondary (local) maxima. However, the initial stages 
of a parameter search overwhelmingly involve the rejection of bad solutions for the 
parameter values in favor of better ones; as such, we can afford to sacrifice some initial 
accuracy for computational speed. Once we have found the approximate location of 
the global maximum, we use Monte Carlo simulations to iterate toward an exact 
solution for the likelihood surface in a much smaller region of parameter space. 

Monte Carlo techniques reverse the sense of the likelihood evaluation: instead 
of using Eqs. [12] and [13] to evaluate the likelihood for a map with unknown 
cosmological parameters, we use a set of simulated maps generated from a single 



fixed choice of parameter values and solve Eq. [13] for the unknown value a for which 
the likelihood correctly peaks at the known parameter values. That is, we generate a 
set of m simulated sky maps from a single set of parameter values po. For each map, 
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Figure 2: Histogram of the scale factor a derived from Eq. [13] using 300 Monte 

Carlo simulations. The dotted line indicates the value from direct evaluation of the 
determinants |M| and |M'|. The simulations shown correspond to input parameters 
Q = 0.7, n = 1.0 and normalization Q = 18 //K, but the ability to recover the correct 
scaling a does not depend on the location in parameter space. 



we evaluate x 2 at a grid of parameter values centered on the input parameters (in 
practice, we find that varying a single parameter by a few percent provides acceptable 
results). We then use the x 2 grid and determinants |M'| (which do not depend on 
the map values) to solve Eq. [13| for the scale factor a by requiring that log(£) peak 
at the input parameter value. 

The scale factor a is a weak function of the parameter values. The fitted 
value a from each simulation is thus exact only at the parameter values Pq, becoming 
increasingly poor at grid points further removed fromp - However, since the simulated 
maps correspond to the parameter choice po, the likelihood function must peak exactly 
at po and the process by construction is quadratically convergent. 

We test the algorithm using simulated sky maps with 6144 pixels using different 
choices for sky cut, instrument noise, and subset pixel selection. We generate each 
simulated map using Eq. |5] with random variables a^ m corresponding to a COBE- 
normalized cold dark matter model with £ < 100 and 3° beam smoothing. To each 
sky we add instrument noise following either the DMR noise pattern (Bennett et al. 
1996) or spatially uniform noise of the same mean amplitude. For each realization we 
compute the determinant | M' | and evaluate a using both the great circle and random 
pixel geometries on both the full sky map or the high-latitude portion \b\ > 20°. 

Figure |2| shows a histogram of the recovered values from m = 300 Monte Carlo 
realizations for the full sky map with uniform noise and the great circle geometry. The 
recovered value, a = 5.504 ± 0.006, agrees well with the exact value (5.5036) derived 
by direct evaluation of the determinants |M| and |M'|. Since the method involves no 
integrals over spatial functions it is completely insensitive to sky coverage. Allowing 
the noise to vary with position on the sky increased the scatter in the recovered values 
for a but did not bias the results: 300 simulations using COBE noise with \b\ > 20° 
and random pixels obtained a = 7.07±0.04 compared to the exact value 7.02 for 4016 
high-latitude pixels. The convergence is rapid: One round of Monte Carlo realizations 
usually suffices to derive an accurate likelihood for a single point in parameter space. 



3 Discussion 

The determinant approximation requires 0(N d ) operations and O(Nd) storage for 
each point evaluated in parameter space. Conjugate gradient evaluation of x 2 requires 
0(m iVj) operations, where m is the number of Monte Carlo realizations. During 
the initial parameter search, Monte Carlo simulations are turned off and m — 1. 
Once an initial maximum is located, m « 50 realizations locate the true maximum 
to within 1% of the minimum variance limit. With efficient pre-conditioning (Oh 
et al. 1999), 100 x 2 evaluations for a 10 6 pixel map can be run in 2 hours elapsed 
time. Since the determinant depends only on the model parameters and not on the 
map temperatures in each realization, it need be computed only once for each set of 
parameter values. Timing tests using a single CPU at 400 MHZ demonstrate that the 



total elapsed time required to calculate |M'| scales as iVj- 55 including all overhead. 
Computing the determinant for a map with Nj = 10 6 pixels requires 24 hours elapsed 
time on a single CPU. The entire computation thus has one term that scales as 
At ~ 2 hours (iV d /i0 6 ) 2 and a second term scaling as At ~ 24 hours (iV^/lO 6 ) 3 / 2 . 
For Nd < 10 8 pixels, the determinant is the slower of the two and the entire 
computation scales as N d . 

The likelihood method described above has several advantages. By 
construction, the Monte Carlo simulations converge to the correct solution. Since 
the Monte Carlo algorithm evaluates the likelihood at parameter values close to the 
peak likelihood, it automatically provides the local curvature (Fisher information 
matrix) at no additional computational cost. The technique is well suited for 
the modest parallel processing provided by multiple-CPU work stations currently 
available. Finally, the technique does not depend on assumptions of any symmetry 
on the sky and can be used for any sky map regardless of the shape of the observed 
region. Asymmetric maps (e.g., a cut map that excludes bright regions/point sources 
or a partial map from balloon surveys) may be evaluated using the techniques outlined 
above. 
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